Pseudorandom generators are used ubiquitously in cryptography to overcome the deterministic nature of computers: they stretch a small amount of true randomness into a much larger amount of good-enough pseudorandomness.
An algorithm which fulfils the task of generating more bits from a smaller number of bits is called a generator.
A *generator* is an efficient algorithm $\textit{Gen}(x: str[X]) \to str[R]$ where $R\gt X$ which takes a binary string as an input and produces a longer binary string as an output.
A generator which takes a short string of random bits, called a seed, and expands them into a larger string of pseudorandom bits is called a pseudorandom generator (PRG).
A (secure) *pseudorandom generator* $\textit{PRG}(seed: \textbf{str}[S]) \to \textbf{str}[R]$ is a generator such that for every efficient statistical test $\textit{ST}: \{0,1\}^R \to \{0,1\}$, the output on a uniformly chosen input, called a *seed*, $s \leftarrow_R \{0,1\}^S$ is *pseudorandom*, i.e. it holds that
$$\left|\Pr_{s \leftarrow_R \mathcal{S}}[\textit{ST}(\textit{PRG}(s)) = 1] - \Pr_{r \leftarrow_R \mathcal{R}}[\textit{ST}(r) = 1]\right| \le \epsilon(R)$$
for some negligible $\epsilon(R)$, e.g. $\epsilon(R) = 2^{-R}$.
The set $\mathcal{S} \coloneqq \{0,1\}^S$ is called the *seed space* and the set $\mathcal{R} \coloneqq \{0,1\}^R$ is called the *output space*.
This definition tells us that an algorithm $\textit{PRG}$, which takes a uniformly chosen binary string of length $S$ (i.e. a "truly random" string), called a *seed*, and outputs a longer binary string of length $R$, is a pseudorandom generator if no efficient statistical test can distinguish $\textit{PRG}$'s output from a string chosen uniformly at random from the output space $\mathcal{R}$ with non-negligible advantage.
Essentially, the definition says that the probability that any statistical test thinks a string generated by $\textit{PRG}$ is random is approximately equal to the probability that the same statistical test thinks a string chosen uniformly from $\mathcal{R}$ is random, i.e.
$$\Pr_{s\leftarrow_R \mathcal{S}}[\textit{ST}(\textit{PRG}(s)) = 1] \approx \Pr_{r\leftarrow_R \mathcal{R}}[\textit{ST}(r) = 1]$$
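To make "statistical test" concrete, here is a minimal sketch of one classical test, the monobit test, which checks whether the number of ones is plausibly close to $R/2$; the function name and the threshold are my own choices:

```rust
// A toy statistical test: outputs 1 ("looks random") iff the number of ones
// is within a few standard deviations (sqrt(R)/2 for a fair coin) of R/2.
fn monobit_test(y: &[u8]) -> u8 {
    let ones = y.iter().filter(|&&b| b == 1).count() as f64;
    let r = y.len() as f64;
    if (ones - r / 2.0).abs() <= 3.0 * r.sqrt() / 2.0 {
        1 // plausibly random
    } else {
        0 // too biased to be random
    }
}
```

A secure PRG must fool every efficient test of this shape simultaneously; passing this one test alone is of course nowhere near sufficient.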
It does not matter much if you understand the nitty-gritty details of this definition for the security of a pseudorandom generator, because in practice it is close to useless: there is no known PRG which has been proven to satisfy it. The reason is that proving any PRG secure in this sense would also prove that $\textbf{P} \neq \textbf{NP}$, one of the biggest open problems in computer science.
Nevertheless, it gives us an idealised model for what a secure PRG should be.
We can derive some properties from the definition of a PRG which can hint that a candidate PRG is secure and can be trusted.
A secure PRG is *unpredictable* in the sense that there is no algorithm which, given the first $i$ bits of the output of $\textit{PRG}$, can guess the $(i+1)$-st bit with probability non-negligibly greater than $\frac{1}{2}$. Conversely, an unpredictable PRG is secure.
We are given a secure PRG $G(seed: str[S]) \to str[R]$ and need to prove that it is unpredictable. Suppose, towards contradiction, that $G$ is predictable, i.e. there exist an index $i$ and an efficient algorithm $\mathcal{A}$ which, when given the bits $y[0], y[1], ..., y[i]$ of the output of $G$, can guess the bit $y[i+1]$ with probability $\alpha \ge \frac{1}{2} + \xi(R)$ for some non-negligible $\xi$ (yes, even if the algorithm works for just a single position in the output, the PRG is predictable). We define the following statistical test (or distinguisher)
$$D(y) = \begin{cases}1, \text{ if } \mathcal{A}(y[0],y[1],...,y[i]) = y[i+1] \\ 0, \text{ otherwise}\end{cases}$$
Essentially, $D$ outputs 1 if the algorithm $\mathcal{A}$ guesses correctly. If the string $y$ is chosen from a uniform distribution, i.e. $y \leftarrow_R \{0,1\}^R$, then the algorithm $\mathcal{A}$ has no information about $y[i+1]$ and cannot guess it with probability better than $\frac{1}{2}$. On the other hand, if $y$ is generated by $G$, then $\mathcal{A}$ can guess $y[i+1]$ with probability $\alpha \ge \frac{1}{2} + \xi(R)$, which means that $D$ distinguishes a string generated by $G$ from a truly uniformly chosen string with non-negligible advantage, contradicting the original assumption that $G$ is secure.
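As a code sketch (with `a` standing in for the hypothetical predictor $\mathcal{A}$ and `i` for the position it predicts, both coming from the predictability assumption):

```rust
// A sketch of the distinguisher D built from a predictor `a` for position i+1.
// Bits are represented as u8 values 0/1.
fn d(y: &[u8], i: usize, a: impl Fn(&[u8]) -> u8) -> u8 {
    if a(&y[..=i]) == y[i + 1] { 1 } else { 0 }
}
```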
For the other direction, we are given a generator $G(seed: str[S]) \to str[R]$ that is unpredictable for all positions $i \in \{0,1,...,R - 1\}$. We want to prove that $G$ is secure. We will denote by $G_iU_{R-i}$ a string whose first $i$ bits were generated using $G$ and whose remaining $R-i$ bits were chosen according to a uniform distribution. We denote the distribution of such strings by $H_i$. It is clear that $H_0$ is the uniform distribution $U_R$ and $H_R$ is $\{G(s)\}_{s\leftarrow_R \mathcal{S}}$. We need to show that $H_{i-1} \approx H_i$ for all $i$, since then $H_0 \approx H_R$ follows by the triangle inequality for computational indistinguishability.
Suppose, towards contradiction, that $H_{i-1} \not\approx H_i$ for some $i$, i.e. there is a distinguisher $D$ such that
$$\left|\Pr_{x\sim H_{i-1}}[D(x) = 1] - \Pr_{x\sim H_i}[D(x) = 1]\right| \gt \xi(R)$$
for some non-negligible $\xi(R)$.
From $D$ we can construct a predictor which contradicts the unpredictability of $G$. Given the first $i-1$ bits $y[0],y[1],...,y[i-2]$ of $G$'s output, the predictor picks a uniformly random bit $b$ for position $i-1$, fills positions $i$ through $R-1$ with further uniformly random bits, and runs $D$ on the assembled string. If $D$ outputs 1 (assume, without loss of generality, that $D$ outputs 1 more often on $H_i$), the predictor guesses $b$; otherwise it guesses $1-b$. The assembled string is distributed according to $H_{i-1}$, while conditioned on $b$ being equal to $G$'s actual bit it is distributed according to $H_i$, and a short calculation shows that the predictor then guesses $G$'s bit at position $i-1$ correctly with probability at least $\frac{1}{2} + \xi(R)$, contradicting the unpredictability of $G$.
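A code sketch of this predictor, reusing the `random_bit()` helper from the pseudocode later in this section; the distinguisher `d` is passed in as a function, so this illustrates the argument rather than a concrete attack:

```rust
// A sketch of the predictor built from a distinguisher `d` for H_(i-1) vs H_i.
// `prefix` holds G's first i-1 output bits; `r` is the total output length R.
// We assume d outputs 1 more often on H_i; otherwise flip the final guess.
fn predict_next_bit(prefix: &[u8], r: usize, d: impl Fn(&[u8]) -> u8) -> u8 {
    let mut y: Vec<u8> = prefix.to_vec();
    let b = random_bit();              // uniform guess for position i-1
    y.push(b);
    while y.len() < r {
        y.push(random_bit());          // positions i..R-1 are uniformly random
    }
    if d(&y) == 1 { b } else { 1 - b } // d "accepting" suggests y ~ H_i, i.e. b was G's bit
}
```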
Unfortunately, these two properties only provide a potential way to rule a PRG out as insecure. Proving that a PRG is unpredictable is just as difficult as proving that it is secure, since unpredictability is essentially an equivalent definition of security.
At the end of the day, we simply assume that secure generators exist. In fact, we have many PRGs that we believe to be secure but are unable to prove it. Similarly, we have many PRGs that have been shown to be insecure and should not be used. So really, we consider a PRG secure until someone comes along and shows a way to break it. Since we have no better alternative, i.e. we do not know how to prove that any PRG is secure, we are forced to take the leap of faith and make do with what we have.
Nevertheless, in order to be as safe as possible, one needs to make as few assumptions as possible, and that is indeed what cryptography does. The only assumption regarding the existence of secure PRGs which cryptography makes is the following.
There exists a secure $\textit{PRG}(seed: str[S]) \to str[S+1]$ which takes a seed of length $S$ and produces a pseudorandom string with length $S + 1$.
This assumption has neither been proven nor refuted, but there is a lot of evidence supporting it (and it had better be true, because cryptography falls apart otherwise). Okay, but this assumption in itself does not seem particularly helpful, for it only allows us to produce a pseudorandom string which is one bit longer than its random seed; we have really only gained 1 bit of randomness. Fortunately, it turns out that such a PRG can be used to construct a new PRG which takes a seed of length $S$ and produces a pseudorandom string of any length $T$, so long as $T$ is polynomial in $S$.
Let's see how we can do this. We are given a pseudorandom generator $\textit{Gen}(seed: str[S]) \to str[S+1]$ which expands its seed by a single bit. And here is an algorithm $\textit{GenT}(seed: str[S]) \to str[T]$ which invokes $\textit{Gen}$ repeatedly to produce $T$ pseudorandom bits:
```rust
fn GenT(seed: str[S]) -> str[T] {
    let y: str[T]; // Initialise the output y
    let current_seed = seed;
    let i = 0;
    while i < T {
        let pseudorandom_str = Gen(current_seed); // Get the output of Gen from the current seed
        y[i] = pseudorandom_str[S]; // Use the last bit of Gen's output for the current bit of y; the last bit is at index (S + 1) - 1 = S
        current_seed = pseudorandom_str[0..S]; // The new seed is Gen's output without the last bit
        i = i + 1; // Move on to the next output bit
    }
    return y;
}
```
This algorithm provides us with a generator that can produce a string of any length $T$, provided that $T$ is polynomial in $S$.
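To see the chaining in action, here is a runnable toy instantiation; `toy_gen`, `gen_t` and the constants are my own, and the linear congruential step standing in for $\textit{Gen}$ is absolutely not a secure PRG, it only illustrates the structure:

```rust
// toy_gen plays the role of Gen: it maps a 64-bit "seed" to a new 64-bit
// seed plus one output bit. An LCG step is NOT secure; it is a stand-in.
fn toy_gen(seed: u64) -> (u64, u8) {
    let next = seed
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (next, (next >> 63) as u8) // next seed, plus the top bit as the output bit
}

// gen_t mirrors GenT: keep one bit per iteration, feed the rest back in.
fn gen_t(seed: u64, t: usize) -> Vec<u8> {
    let mut y = Vec::with_capacity(t);
    let mut current_seed = seed;
    for _ in 0..t {
        let (next_seed, bit) = toy_gen(current_seed);
        y.push(bit);              // the "extra" bit becomes the next output bit
        current_seed = next_seed; // the remaining state becomes the next seed
    }
    y
}

fn main() {
    println!("{:?}", gen_t(42, 16)); // 16 pseudorandom-looking bits
}
```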
We are given the algorithm $\textit{GenT}(seed: str[S]) \to str[T]$ with seed space $\mathcal{S} = \{0,1\}^S$ and we need to prove that it is secure.
Let's introduce some notation. We denote by $U_i\textit{Gen}_{T-i}$ a string whose first $i$ bits were chosen according to the uniform distribution over $\{0,1\}^i$ and whose remaining $T-i$ bits were generated by the same process that $\textit{GenT}$ uses, starting from some seed $s_i \leftarrow_R \mathcal{S}$. We denote by $H_i$ the distribution of strings generated in this way. Therefore, $H_0$ is the distribution $\{\textit{GenT}(s)\}_{s \leftarrow_R \mathcal{S}}$, for there are no bits chosen from a uniform distribution in this case. Conversely, $H_T$ is the uniform distribution over $\{0,1\}^T$, because none of its bits are generated by $\textit{GenT}$'s process.
We need to show that $H_0 \approx H_T$, which can be done by using the [triangle inequality for computational indistinguishability](Security%20Notions/Randomness.md#admonition-theorem-triangle-inequality-for-computational-indistinguishability) and just showing that $H_i \approx H_{i+1}$ for all $i \in \{0,1,...,T-1\}$.
Suppose, towards contradiction, that there is a statistical test $D(y: str[T]) \to \textit{bit}$ and some $i$ such that
$$\left|\Pr_{y \sim H_i}[D(y) = 1] - \Pr_{y \sim H_{i+1}}[D(y) = 1]\right| \ge \epsilon$$
for some non-negligible $\epsilon$.
We now construct an algorithm $D'(y: str[S+1]) \to bit$ which interprets $y$ as the output of $\textit{Gen}$ at some stage of $\textit{GenT}$'s execution; the seed used at that stage will be called $s_i$. This output is comprised of a seed for the next stage, i.e. $s_{i+1} \coloneqq y[0]y[1]\cdots y[S-1]$, and one output bit, $y[S]$. From $y$, $D'$ assembles a string $z$ of length $T$. The bits $z[0],z[1],...,z[i-1]$ are chosen according to a uniform distribution, the bit $y[S]$ is copied into $z[i]$, and the bits $z[i+1],z[i+2],...,z[T-1]$ are generated by the same process as $\textit{GenT}$, using $s_{i+1}$ as the initial seed. At the end, $D'$ simply returns $D(z)$.
```rust
fn D'(y: str[S+1]) -> bit {
    let z: str[T];
    // i is the constant for which we assume that H_i is distinguishable from H_(i+1)
    for (let j = 0; j < i; ++j) {
        z[j] = random_bit(); // Initialise the first i bits, i.e. z[0],...,z[i-1], to uniformly random values
    }
    z[i] = y[S]; // Copy the output bit of y into position i
    let current_seed = y[0..S]; // Interpret the first S bits of y as the seed for the next stage
    for (let j = i + 1; j < T; ++j) { // Execute the same algorithm as GenT to generate the remaining bits of z
        let pseudorandom_str = Gen(current_seed);
        z[j] = pseudorandom_str[S];
        current_seed = pseudorandom_str[0..S];
    }
    return D(z); // Return whatever value D gives for the string z
}
```
If $D$ is efficient, then $D'$ is also clearly efficient, so we need not worry about this anymore. Now, if $y$ is chosen according to a uniform distribution, then $D'$ feeds into $D$ a string $z$ distributed according to $H_{i+1}$: the bits $z[0],z[1],...,z[i-1]$ are uniformly random, the bit $z[i]$ is copied from $y[S]$, which is itself uniformly random, and the rest of the bits are generated by $\textit{Gen}$'s process. On the other hand, if $y = \textit{Gen}(s)$ is the output of $\textit{Gen}$ for some random seed $s$, then $D'$ feeds into $D$ a string $z$ distributed according to $H_i$: the bits $z[0],z[1],...,z[i-1]$ are uniformly random, while $z[i]$ is copied from $y[S]$, which was generated by $\textit{Gen}$, and together with the remaining bits $z[i+1],...,z[T-1]$ it is distributed exactly like the last $T-i$ bits of $\textit{GenT}$'s process started from the seed $s$. Under our assumption it follows that
$$\left|\Pr_{s \leftarrow_R \{0,1\}^S}[D'(\textit{Gen}(s)) = 1] - \Pr_{y \leftarrow_R \{0,1\}^{S+1}}[D'(y) = 1]\right| \ge \epsilon$$
But this contradicts the security of $\textit{Gen}$.
One might think that $T \lt 2^S$ is also a requirement, because otherwise the algorithm $\textit{GenT}$ would execute more than $2^S$ steps and would thus run through more than $2^S$ seeds, which means it would start repeating seeds, thus making it predictable. However, the requirement that $T$ is polynomial in $S$ takes care of that: $2^S$ grows so mind-bogglingly fast that for realistic seed lengths it dwarfs any polynomial we would ever use. Besides, it is unlikely that you want to produce a googol bits from a 128-bit seed.
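For a concrete sense of scale, take $S = 128$ and an absurdly generous polynomial bound like $p(S) = S^{10}$ (my own example):
$$T = S^{10} = (2^7)^{10} = 2^{70} \ll 2^{128} = 2^S$$
Even this output length, far beyond anything practical, would consume fewer than a $2^{70}/2^{128} = 2^{-58}$ fraction of the available seeds.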